Spatial orientations of visual word pairs to improve Bag-of-Visual-Words model

Authors

  • Rahat Khan
  • Cécile Barat
  • Damien Muselet
  • Christophe Ducottet
Abstract

This paper presents a novel approach for incorporating spatial information into the bag-of-visual-words (BoVW) model [1, 3] for category-level and scene classification. In the traditional BoVW model, feature vectors are histograms of visual words. This representation is appearance-based and contains no information about the arrangement of the visual words in the 2D image space. Within this framework, we present a simple and efficient way to infuse spatial information. In particular, we are interested in explicit global relationships among the spatial positions of visual words. To that end, we first introduce the notion of Pair of Identical visual Words (PIW), defined as the set of all pairs of visual words of the same type. The spatial distribution of words is then represented as a histogram of the orientations of the segments formed by PIW. Figure 1 shows an example that gives an intuition for our approach. Our method eliminates a number of drawbacks of previous approaches [2, 3] by i) proposing a simpler word selection technique that supports fast, exhaustive spatial information extraction, ii) enabling the infusion of global spatial information, and iii) being robust to geometric transformations such as translation and scaling.

In the conventional BoVW model, each image is represented by a set of local descriptors {d1, ..., dn} extracted from n patches around interest points or on regular grids. A visual vocabulary W = {w1, w2, ..., wN} is obtained by clustering a set of descriptors from all the training images, where N is the predefined size of the vocabulary. Each patch of the image is then mapped to the nearest visual word according to the following equation:
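
The equation itself is not reproduced in this excerpt. As a minimal sketch only, the two steps described above (nearest-word mapping and a histogram of PIW segment orientations) might look like the following in Python with NumPy. It assumes Euclidean distance for the word assignment and equal-width orientation bins over [0, π); neither choice is stated in the abstract. The names `assign_words`, `piw_orientation_histogram`, and `n_bins` are hypothetical and used for illustration.

```python
import numpy as np


def assign_words(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word.

    Assumption: "nearest" means smallest Euclidean distance.
    descriptors: array of shape (n, D); vocabulary: array of shape (N, D).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Squared distances between every descriptor and every word, shape (n, N).
    diff = descriptors[:, None, :] - vocabulary[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    return dist2.argmin(axis=1)


def piw_orientation_histogram(positions, words, word_id, n_bins=9):
    """Histogram of segment orientations over all pairs of one visual word.

    positions: array of shape (n, 2) with the (x, y) location of each patch.
    words: array of shape (n,) with the word index of each patch.
    Assumption: orientations are undirected, so angles are folded into
    [0, pi) and split into n_bins equal-width bins.
    """
    positions = np.asarray(positions, dtype=float)
    words = np.asarray(words)
    pts = positions[words == word_id]
    # All unordered pairs (i < j) of patches carrying the same word.
    i, j = np.triu_indices(len(pts), k=1)
    delta = pts[j] - pts[i]
    theta = np.arctan2(delta[:, 1], delta[:, 0]) % np.pi
    # Clamp guards against theta rounding up to pi in floating point.
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    return np.bincount(bins, minlength=n_bins)
```

Because only differences between patch positions enter the angle computation, this histogram is unchanged when all positions are translated or uniformly scaled, which is consistent with the robustness property claimed in the abstract.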

Similar resources

A Novel Method for Content Base Image Retrieval Using Combination of Local and Global Features

Content-based image retrieval (CBIR) has been an active research topic in the last decade. In this paper we proposed an image retrieval method using global and local features. Firstly, for local features extraction, SURF algorithm produces a set of interest points for each image and a set of 64-dimensional descriptors for each interest points and then to use Bag of Visual Words model, a cluster...

Crowded Pedestrian Detection and Density Estimation by Visual Words Analysis

Crowded pedestrian detection and density estimation are very useful and important under transportation environment. In this paper, we present a novel method for crowded pedestrian detection and density estimation through a weighting scheme of bag of visual words model which characterizes both the weight and the relative spatial arrangement aspects of visual words in depicting an image. Firstly,...

Spatial Weighting for Bag-of-Visual-Words and Its Application in Content-Based Image Retrieval

It is a challenging and important task to retrieve images from a large and highly varied image data set based on their visual contents. Problems like how to fill the semantic gap between image features and the user have attracted a lot of attention from the research community. Recently, the 'bag of visual words' approach exhibits very good performance in content-based image retrieval (CBIR). Ho...

Egocentric Activity Recognition Using Bag of Visual Words

This paper presents an approach for recognizing activities using video from the egocentric setup. In this approach instead of using intermediate setup like object detection, pose estimation, modeling spatial distribution of visual words is implemented. The interactions are encoded by using Histogram oriented Pairwise Relation named (HOPR) between the visual words, orientations and alignments. A...

Publication date: 2012